13 research outputs found

    Parallel Processes in HPX: Designing an Infrastructure for Adaptive Resource Management

    Advancements in cutting-edge technologies have enabled better energy efficiency as well as greater computational power for the latest High Performance Computing (HPC) systems. However, the complexity introduced by hybrid architectures and emerging classes of applications has led to poor computational scalability under conventional execution models. Alternative means of computation that address these bottlenecks are therefore warranted. More precisely, dynamic adaptive resource management, from both the system's and the application's perspective, is essential for better computational scalability and efficiency. This research presents and expands the notion of Parallel Processes as a placeholder for procedure definitions targeted at one or more synchronous domains, metadata for computation and resource management, and infrastructure for dynamic policy deployment. In addition, the research presents guidelines for a resource management framework in the HPX runtime system. Further, it lists design principles for the scalability of the Active Global Address Space (AGAS), a necessary feature for Parallel Processes. Finally, to verify the usefulness of Parallel Processes, a preliminary performance evaluation of different task scheduling policies is carried out using two applications: Unbalanced Tree Search, a reference dynamic graph application implemented in HPX by this research, and MiniGhost, a reference stencil-based application using the bulk synchronous parallel model.
The results show that different scheduling policies provide better performance for different classes of applications; even within the same application class, one policy fared better in certain instances while another fared better in others. This supports the hypothesis that a dynamic adaptive resource management infrastructure, capable of deploying different policies and task granularities, is needed for scalable distributed computing.
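The trade-off the abstract observes can be illustrated with a toy simulation (this is not HPX code; the two policies, the worker count, and the cost distributions are illustrative assumptions). A static block partition works well for uniform, stencil-like workloads, while a dynamic policy that balances load, roughly approximating a work-stealing scheduler, wins on skewed, UTS-like workloads:

```python
def static_makespan(costs, workers):
    # Static block partition: contiguous chunks, no load balancing.
    chunk = (len(costs) + workers - 1) // workers
    return max(sum(costs[i * chunk:(i + 1) * chunk]) for i in range(workers))

def dynamic_makespan(costs, workers):
    # Dynamic (greedy) policy: each task goes to the currently least-loaded
    # worker, approximating a work-stealing scheduler's balancing effect.
    loads = [0] * workers
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

uniform = [1] * 64                     # BSP/stencil-like: uniform task costs
skewed  = [1] * 60 + [16, 16, 16, 16]  # UTS-like: a few heavy subtrees

for name, costs in [("uniform", uniform), ("skewed", skewed)]:
    print(name, static_makespan(costs, 4), dynamic_makespan(costs, 4))
```

On the uniform workload both policies produce the same makespan, so the cheaper static policy is preferable; on the skewed workload the dynamic policy's makespan is far lower, which is the kind of workload-dependent behavior that motivates deploying different policies at runtime.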

    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications thus become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing of critical data structures; and, most importantly, several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed-memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.
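The recovery idea behind ULFM-style fault tolerance for data-parallel training can be sketched in a language-agnostic way (this is a toy, not the paper's MaTEx-Caffe code; the function name, rank count, and two-element gradients are assumptions). In data parallelism each rank computes a gradient on its shard and the ranks average them; under ULFM, when a rank suffers a permanent fault, the survivors shrink the group and keep averaging over the remaining ranks instead of aborting, where a real implementation would use MPI_Allreduce over a communicator repaired with MPIX_Comm_shrink:

```python
def data_parallel_step(shard_grads, failed=frozenset()):
    """Average per-rank gradients, ULFM-style: survivors 'shrink' the
    group and continue rather than aborting the whole job."""
    survivors = [g for rank, g in enumerate(shard_grads) if rank not in failed]
    if not survivors:
        raise RuntimeError("all ranks failed")
    # Elementwise average over the surviving ranks only.
    return [sum(vals) / len(survivors) for vals in zip(*survivors)]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(data_parallel_step(grads))              # all four ranks healthy
print(data_parallel_step(grads, failed={2}))  # rank 2 lost; ranks 0, 1, 3 continue
```

The shrink-and-continue step slightly changes the effective minibatch (the failed rank's shard drops out), which is one reason the paper weighs checkpointing of critical data structures against algorithmic tolerance.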

    Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators

    The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement. This clearly highlights the need for a tool to move quickly and automatically from algorithm definition to hardware implementation and to explore the design space along a variety of SWaP (size, weight, and power) metrics. The Software Defined Architectures (SODA) synthesizer implements a modular compiler-based infrastructure for the end-to-end generation of machine learning accelerators, from high-level frameworks down to hardware description language. Neuromorphic computing, which mimics how the brain operates, promises to perform artificial intelligence tasks at efficiencies orders of magnitude higher than current conventional tensor-processing-based accelerators, as demonstrated by a variety of specialized designs leveraging Spiking Neural Networks (SNNs). Nevertheless, mapping an artificial neural network (ANN) to solutions supporting SNNs is still a non-trivial and very device-specific task, and there is currently no way to design hybrid systems that integrate conventional and spiking neural models. In this paper, we discuss the design of such an integrated generator, leveraging the SODA Synthesizer framework and its modular structure. In particular, we present a new MLIR dialect in the SODA frontend that allows expressing spiking neural network concepts (e.g., spiking sequences, transformation, and manipulation), and we discuss how to enable the mapping of spiking neurons to the related specialized hardware (which could be generated through the middle-end and backend layers of the SODA Synthesizer). We then discuss the opportunities for further integration offered by the hardware compilation infrastructure, providing a path towards the generation of complex hybrid artificial intelligence systems.
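Why the ANN-to-SNN mapping is non-trivial can be seen in a minimal leaky integrate-and-fire (LIF) neuron sketch (a standard textbook model, not the SODA dialect or its generated hardware; the threshold and leak constants are illustrative assumptions). An ANN activation is a single real value, whereas an SNN must encode it over time, e.g. as a firing rate, so stronger inputs fire more often:

```python
def lif_spike_train(input_current, steps, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: integrate the input with leak,
    emit a spike and reset when the membrane potential crosses threshold."""
    v, spikes = 0.0, []
    for _ in range(steps):
        v = leak * v + input_current  # leaky integration
        if v >= threshold:
            spikes.append(1)
            v = 0.0                   # reset after firing
        else:
            spikes.append(0)
    return spikes

# A stronger input drives a higher firing rate (rate coding of an activation).
low = sum(lif_spike_train(0.2, 50))
high = sum(lif_spike_train(0.6, 50))
print(low, high)
```

Even this toy shows the device-specific choices a generator must make explicit, such as encoding scheme, time window, threshold, and leak, which is what a dedicated MLIR dialect for spiking concepts aims to capture.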

    SO(DA)^2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk)
